Adding Domain Specificity to an MT System
Authors
Abstract
In developing a machine translation system, one important issue is adapting to a specific domain without time-consuming lexical work. We experimented with a statistical word-alignment algorithm to derive French-English word association pairs that complement an existing multipurpose bilingual dictionary. This word association information is added to the system when our translation pattern database is automatically created, thereby making the database more domain specific. This technique significantly improves the overall quality of translation, as measured in an independent blind evaluation.
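The abstract does not name the word-alignment algorithm used, so the following is only a minimal sketch of the general idea, assuming IBM Model 1 as a stand-in. The function names `train_ibm1` and `new_associations`, the probability threshold, and the toy corpus are all hypothetical and do not come from the paper. The sketch estimates translation probabilities from sentence pairs, then keeps the strongest French-English pair for each French word when the existing dictionary does not already list it.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=25):
    """Estimate t(e|f) with IBM Model 1 EM (a stand-in, not the paper's method).

    pairs: list of (french_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, french_word) to a probability.
    """
    english_vocab = {e for _, en in pairs for e in en}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every (e, f) pair
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for fr, en in pairs:
            for e in en:
                norm = sum(t[(e, f)] for f in fr)
                for f in fr:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per French word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return dict(t)


def new_associations(t, dictionary, threshold=0.5):
    """Keep the best English word per French word if the dictionary lacks it.

    dictionary: maps a French word to the set of English translations
    already listed in the general-purpose dictionary (hypothetical format).
    """
    best = {}
    for (e, f), p in t.items():
        if p > best.get(f, (None, 0.0))[1]:
            best[f] = (e, p)
    return {
        f: e
        for f, (e, p) in best.items()
        if p >= threshold and e not in dictionary.get(f, set())
    }


# Invented toy corpus, for illustration only.
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "fleur"], ["a", "flower"]),
]
general_dictionary = {"la": {"the"}, "une": {"a"}}
table = train_ibm1(corpus)
print(new_associations(table, general_dictionary))
```

In a real setting the corpus would be domain-specific parallel text, and the derived pairs would supplement the dictionary rather than replace it. How the pairs are then used when the translation pattern database is built is not described in the abstract and is not modelled here.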
Similar resources
Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies
In this paper, we address the problem of machine translation (MT) of domain-specific texts for which large amounts of parallel data for training are not available. We focus on the IT domain and on English to Portuguese machine translation, and compare different strategies for improving system performance over two baselines, the first using only a large dataset of out-of-domain data, and the secon...
Use of Domain-Specific Language Resources in Machine Translation
In this paper, we address the problem of Machine Translation (MT) for a specialised domain in a language pair for which only a very small domain-specific parallel corpus is available. We conduct a series of experiments using a purely phrase-based SMT (PBSMT) system and a hybrid MT system (TectoMT), testing three different strategies to overcome the problem of the small amount of in-domain train...
Domain-Specific Hybrid Machine Translation from English to Portuguese
Machine translation (MT) from English to Portuguese has not typically received much attention in existing research. In this paper, we focus on MT from English to Portuguese for the specific domain of information technology (IT), building a small in-domain parallel corpus to address the lack of IT-specific and publicly available parallel corpora and then adapting an existing hybrid MT system to t...
Exploiting Multiple Resources for Japanese to English Patent Translation
This paper describes the development of a Japanese to English translation system using multiple resources and NTCIR-10 Patent translation collection. The MT system is based on different training data, the Wiktionary as a bilingual dictionary and Moses decoder. Due to the lack of parallel data on the patent domain, additional training data of the general domain was extracted from Wikipedia. Expe...
A putative glutathione-binding site in CdZn-metallothionein identified by equilibrium binding and molecular-modelling studies.
Glutathione (GSH) has been found to form a complex with both vertebrate and invertebrate copper-metallothionein (CuMT) [Freedman, Ciriolo and Peisach (1989) J. Biol. Chem. 264, 5598-5605; Brouwer and Brouwer-Hoexum (1991) Arch. Biochem. Biophys. 290, 207-213]. In this paper we report on the interaction of GSH with CdZnMT-I and CdZnMT-II from rabbit liver and with CdMT-I from Blue crab hepatopan...
Publication date: 2001